A Study of Parentheticals in Discourse Corpora - Implications for NLG Systems
Abstract
This paper presents a corpus study of parenthetical constructions in two corpora: the Penn Discourse Treebank (PDTB; PDTBGroup, 2008) and the RST Discourse Treebank (Carlson et al., 2001). The study aims to clarify the rhetorical properties of parentheticals so that a natural language generation system can produce them as part of rhetorically well-formed output. We argue that there is a correlation between syntactic and rhetorical types of parentheticals and establish two main categories: ELABORATION/EXPANSION-type NP-modifier parentheticals and NON-ELABORATION/EXPANSION-type VP- or S-modifier parentheticals. We show several strategies for extracting these from the two corpora and discuss how the seemingly contradictory results can be reconciled in light of the rhetorical and syntactic properties of parentheticals and the decisions taken in the annotation guidelines.
Similar resources
The Penn Discourse TreeBank as a Resource for Natural Language Generation
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated...
A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles
Epistemic modality devices are believed to be one of the prominent characteristics of research articles, a genre commonly used among members of the academic community. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross-culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...
Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction
Since Swales’ (1981, 1990) work on the CARS model of move structure in research articles, many studies on genre analysis have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...
The multifunctionality of epistemic parentheticals in discourse: prosodic cues to the semantic-pragmatic boundary
The aim of this study is to identify the relation between the interpretation of epistemic parentheticals in discourse and their prosodic realisation. Data drawn from a corpus of British English speech suggest that epistemic parentheticals (comment clauses such as I think, I believe) convey a spectrum of meaning from propositional to interpersonal. They have long been categorised simply as sen...
What is in a text and what does it do: Qualitative Evaluations of an NLG system - the BT-Nurse - using content analysis and discourse analysis
Evaluations of NLG systems are generally quantitative, that is, based on corpus comparison statistics and/or results of experiments with people. Outcomes of such evaluations are important in demonstrating whether or not an NLG system is successful, but leave gaps in understanding why this is the case. Alternatively, qualitative evaluations carried out by experts provide knowledge on where a syst...